Convolutional neural network for automatic maxillary sinus segmentation on cone-beam computed tomographic images

An accurate three-dimensional (3D) segmentation of the maxillary sinus is crucial for multiple diagnostic and treatment applications. Yet, it is challenging and time-consuming when performed manually on a cone-beam computed tomography (CBCT) dataset. Recently, convolutional neural networks (CNNs) have shown excellent performance in the field of 3D image analysis. Hence, this study developed and validated a novel automated CNN-based methodology for the segmentation of the maxillary sinus using CBCT images. A dataset of 264 sinuses was acquired from 2 CBCT devices and randomly divided into 3 subsets: training, validation, and testing. A 3D U-Net architecture CNN model was developed and compared to semi-automatic segmentation in terms of time, accuracy, and consistency. The average time was significantly reduced (p-value < 2.2e−16) by automatic segmentation (0.4 min) compared to semi-automatic segmentation (60.8 min). The model accurately identified the segmented region with a Dice similarity coefficient (DSC) of 98.4%. The inter-observer reliability for minor refinement of automatic segmentation showed an excellent DSC of 99.6%. The proposed CNN model provided a time-efficient, precise, and consistent automatic segmentation, which could allow an accurate generation of 3D models for diagnosis and virtual treatment planning.

www.nature.com/scientificreports/
The manual segmentation of the maxillary sinus on CBCT images is time-consuming and dependent on the practitioner's experience, with high inter- and intra-observer variability 18 . Other techniques, such as semi-automatic segmentation, improve the segmentation efficiency, yet they still require manual adjustments that can also induce error 10,19 . Recently, artificial intelligence (AI) technologies have started to play a growing role in the field of dentomaxillofacial radiology 20,21 . In particular, deep learning algorithms have gained much attention in the medical field for their ability to handle large and complex data, extract useful information, and automatically learn feature hierarchies such as edges, shapes and corners 22 .
Convolutional neural network (CNN) is one of the deep learning approaches that has shown an excellent performance in the field of image analysis. It uses multi-layer neural computational connections for image processing tasks such as classification and segmentation 22 . The application of CNN for CBCT image segmentation could overcome the challenges associated with the other techniques by providing an efficient and consistent segmentation tool, while keeping the anatomical accuracy. Therefore, the aim of this study was to develop and validate a novel automated CNN-based methodology for the segmentation of maxillary sinus on CBCT images.

Materials and methods
This study was conducted in accordance with the standards of the Helsinki Declaration on medical research. Institutional ethical committee approval was obtained from the Ethical Review Board of the University Hospitals Leuven (reference number: S57587). Informed consent was not required as patient-specific information was anonymized. The study plan and report followed the recommendations of Schwendicke et al. 23 for reporting on artificial intelligence in dental research.

Dataset.
A sample of 132 CBCT scans (264 sinuses, 75 females and 57 males, mean age 40 years) acquired from 2013 to 2021 with different scanning parameters was collected (Table 1). Inclusion criteria were patients with permanent dentition and maxillary sinus with/without mucosal thickening (shallow > 2 mm, moderate > 4 mm) and/or with semi-spherical membrane in one of the walls 24 . Scans having dental restorations, orthodontic brackets and implants were also included. The exclusion criteria were patients with a history of trauma, sinus surgery and presence of pathologies affecting the sinus contour.
The Digital Imaging and Communication in Medicine (DICOM) files of the CBCT images were exported anonymously. The dataset was then randomly divided into three subsets: (1) training set (n = 83 scans) for training of the CNN model based on the ground truth; (2) validation set (n = 19 scans) for evaluation and selection of the best model; (3) testing set (n = 30 scans) for testing the model performance by comparison with the ground truth.
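As a minimal sketch of the random three-way split described above, the following Python snippet shuffles 132 scan identifiers and cuts them into subsets of 83, 19 and 30. The function name `split_scans`, the integer scan identifiers and the fixed seed are illustrative assumptions; the study does not report how the randomization was implemented.

```python
import random


def split_scans(scan_ids, n_train=83, n_val=19, n_test=30, seed=0):
    """Randomly split scan identifiers into training, validation and testing sets.

    The seed is an assumption added for reproducibility of this sketch.
    """
    scan_ids = list(scan_ids)
    if n_train + n_val + n_test != len(scan_ids):
        raise ValueError("subset sizes must sum to the number of scans")
    random.Random(seed).shuffle(scan_ids)
    train = scan_ids[:n_train]
    val = scan_ids[n_train:n_train + n_val]
    test = scan_ids[n_train + n_val:]
    return train, val, test


# Hypothetical identifiers 0..131 stand in for the 132 CBCT scans.
train, val, test = split_scans(range(132))
print(len(train), len(val), len(test))
```

Splitting at the scan level, as here, keeps both sinuses of one patient in the same subset.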
Ground truth labelling. The ground truth datasets for training and testing of the CNN model were labelled by semi-automatic segmentation of the sinus using Mimics Innovation Suite (version 23.0, Materialise N.V., Leuven, Belgium). Initially, a custom threshold was set between −1024 and −200 Hounsfield units (HU) to create a mask of the air (Fig. 1a). Subsequently, the region of interest (ROI) was isolated from the surrounding structures. A manual delineation of the bony contours was performed using the eclipse and livewire functions, and all contours were checked in the coronal, axial, and sagittal orthogonal planes (Fig. 1b). To avoid inconsistencies in the ROI between images, the segmentation region was limited to the early start of the sinus ostium from the sinus side, before its continuation into the infundibulum (Fig. 1b). Finally, the edited mask of each sinus was exported separately as a standard tessellation language (STL) file. The segmentation was performed by a dentomaxillofacial radiologist (NM) with seven years of experience and subsequently re-assessed by two other radiologists (KFV and RJ) with 15 and 25 years of experience, respectively.
CNN model architecture and training. Two 3D U-Net architectures were used 25 , both of which consisted of 4 encoder and 3 decoder blocks, with 2 convolutions with a kernel size of 3 × 3 × 3, followed by a rectified linear unit (ReLU) activation and group normalization with 8 feature maps 26 . Thereafter, max pooling with a kernel size of 2 × 2 × 2 and a stride of two was applied after each encoder, reducing the resolution by a factor of 2. The binary cross-entropy loss ($L_{BCE}$) was computed for each voxel $n$ with ground truth value $y_n$ = 0 or 1 and predicted network probability $p_n$.
A two-step pre-processing of the training dataset was applied. First, all scans were resampled to the same voxel size. Thereafter, to overcome graphics processing unit (GPU) memory limitations, the full-size scan was downsampled to a fixed size.
The first 3D U-Net provided a rough low-resolution segmentation, which was used to propose 3D patches; only those patches belonging to the sinus were cropped. These relevant patches were then passed to the second 3D U-Net, where they were individually segmented and combined to create the full-resolution segmentation map. Finally, binarization was applied and only the largest connected part was kept, followed by application of a marching cubes algorithm on the binary image. The resultant mesh was smoothed to generate a 3D model (Fig. 2).
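To make the voxel-wise loss concrete, the sketch below evaluates the binary cross-entropy terms for a handful of voxels in plain Python. It rests on assumptions: the per-voxel term follows the expression $y_n \log(p_n) + (1 - y_n)\log(1 - p_n)$ as printed in the article, while the negation, the averaging over voxels and the clamping constant `eps` are the conventional choices and are not stated in the text. Any weighting scheme or framework-specific implementation used by the authors is not reproduced here.

```python
import math


def bce_terms(y_true, p_pred, eps=1e-7):
    """Per-voxel values of y*log(p) + (1 - y)*log(1 - p)."""
    terms = []
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0); eps is an assumption
        terms.append(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return terms


def bce_loss(y_true, p_pred):
    """Conventional BCE: negated mean of the per-voxel terms (assumed reduction)."""
    terms = bce_terms(y_true, p_pred)
    return -sum(terms) / len(terms)


# Toy example: four voxels, two inside the sinus (1) and two outside (0).
labels = [1, 1, 0, 0]
confident = [0.9, 0.8, 0.1, 0.2]   # predictions close to the labels
uncertain = [0.4, 0.3, 0.7, 0.6]   # predictions far from the labels
print(bce_loss(labels, confident), bce_loss(labels, uncertain))
```

Predictions that agree with the ground truth give a loss near zero, while confident wrong predictions are penalized heavily, which is why this loss suits a binary sinus/background classifier.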
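The post-processing step that keeps only the largest connected part can be sketched as follows. This is an illustrative pure-Python version operating on a nested-list probability grid; the 0.5 binarization threshold, the 6-connectivity and the function name `largest_component` are assumptions, and the marching cubes and smoothing steps are not shown.

```python
from collections import deque


def largest_component(prob, threshold=0.5):
    """Binarize a 3D probability grid and return the voxel coordinates
    (z, y, x) of its largest 6-connected foreground component."""
    nz, ny, nx = len(prob), len(prob[0]), len(prob[0][0])
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    seen, best = set(), set()
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if prob[z][y][x] < threshold or (z, y, x) in seen:
                    continue
                # Breadth-first flood fill from an unvisited foreground voxel.
                component = {(z, y, x)}
                seen.add((z, y, x))
                queue = deque([(z, y, x)])
                while queue:
                    cz, cy, cx = queue.popleft()
                    for dz, dy, dx in steps:
                        n = (cz + dz, cy + dy, cx + dx)
                        inside = 0 <= n[0] < nz and 0 <= n[1] < ny and 0 <= n[2] < nx
                        if inside and n not in seen and prob[n[0]][n[1]][n[2]] >= threshold:
                            seen.add(n)
                            component.add(n)
                            queue.append(n)
                if len(component) > len(best):
                    best = component
    return best


# Toy 1 x 3 x 5 grid: one 3-voxel blob and one isolated false-positive voxel.
toy = [[[0.9, 0.8, 0.1, 0.7, 0.2],
        [0.9, 0.2, 0.1, 0.1, 0.1],
        [0.1, 0.1, 0.1, 0.1, 0.1]]]
kept = largest_component(toy)
print(sorted(kept))
```

In the toy grid the isolated voxel at (0, 0, 3) is discarded, mirroring how small spurious islands outside the sinus would be removed before meshing.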
The model parameters were optimized with Adam 27 (an optimization algorithm for training deep learning models) with an initial learning rate of 1.25e−4. During training, random spatial augmentations (rotation, scaling, and elastic deformation) were applied. The validation dataset was used to define early stopping, i.e., the saturation point of the model beyond which no further improvement is gained from the training set and continued training would lead to overfitting. The CNN model was deployed to an online cloud-based platform called virtual patient creator (creator.relu.eu, Relu BV, Version October 2021), where users could upload a DICOM dataset and obtain an automatic segmentation of the desired structure.
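Validation-based early stopping can be illustrated with a short sketch. The patience value, the function name `early_stopping_epoch` and the validation losses below are hypothetical and chosen only for illustration; the study does not report its stopping rule in this detail.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the index of the best epoch, stopping once the validation loss
    has failed to improve for `patience` consecutive epochs."""
    best_epoch = 0
    best_loss = float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            break  # saturation point reached: keep the model from best_epoch
    return best_epoch


# Hypothetical validation losses: improvement stalls after epoch 3.
history = [0.90, 0.60, 0.45, 0.40, 0.41, 0.43, 0.42, 0.39]
print(early_stopping_epoch(history))
```

With a patience of 3, training halts at epoch 6 and the weights from epoch 3 are retained; the later value of 0.39 is never reached, which shows the trade-off a larger patience would address.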
Testing of AI pipeline. The testing of the CNN model was performed by uploading DICOM files from the test set to the virtual patient creator platform. The resulting automatic segmentation (Fig. 3) could be later downloaded in DICOM or STL file format. For clinical evaluation of the automatic segmentation, the authors developed the following classification criteria: A-perfect segmentation (no refinement was needed), B-very good segmentation (refinements without clinical relevance, slight over or under segmentation in regions other than the maxillary sinus floor), C-good segmentation (refinements that have some clinical relevance, slight over or under segmentation in the maxillary sinus floor region), D-deficient segmentation (considerable over or under segmentation, independent of the sinus region, with necessary repetition) and E-negative (the CNN model could not predict anything). Two observers (NM and KFV) evaluated all the cases, followed by an expert consensus (RJ). In cases where refinements were required, the STL file was imported into Mimics software and edited using the 3D tools tab. The resulting segmentation was denoted as refined segmentation.
$$L_{BCE} = y_n \log(p_n) + (1 - y_n) \log(1 - p_n)$$
Evaluation metrics. The evaluation metrics 28,29 are outlined in Table 2. The comparison of outcomes among the ground truth, automatic, and refined segmentations was performed by the main observer on the whole testing set. A pilot of 10 scans was tested first, which showed a Dice similarity coefficient (DSC) of 0.985 ± 0.004, Intersection over union (IoU) of 0.969 ± 0.007 and 95% Hausdorff Distance (HD) of 0.204 ± 0.018 mm. Based on these findings, the sample size of the testing set was increased to 30 scans according to the central limit theorem (CLT) 30 .
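The two overlap metrics can be sketched on sets of voxel coordinates. The snippet below uses the standard definitions (DSC as twice the intersection over the summed sizes, IoU as intersection over union); the toy cubes and function names are illustrative assumptions and not data from the study.

```python
def dice(x, y):
    """Dice similarity coefficient between two non-empty voxel sets."""
    return 2.0 * len(x & y) / (len(x) + len(y))


def iou(x, y):
    """Intersection over union between two non-empty voxel sets."""
    return len(x & y) / len(x | y)


# Toy volumes: a 4 x 4 x 4 cube and the same cube shifted by one voxel along x.
truth = {(z, y, x) for z in range(4) for y in range(4) for x in range(4)}
pred = {(z, y, x + 1) for (z, y, x) in truth}

d = dice(truth, pred)
j = iou(truth, pred)
print(d, j)
```

For a single pair of volumes the two metrics are linked by IoU = DSC / (2 − DSC). Applied to the pilot mean DSC of 0.985 this gives roughly 0.970, close to the reported mean IoU of 0.969; exact agreement is not expected because the reported values are averages over cases.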
Time efficiency. The time required for the semi-automatic segmentation was calculated starting from opening the DICOM files in Mimics software till export of the STL file. For automatic segmentation, the algorithm automatically calculated the time required to have a full resolution segmentation. The time for the refined segmen-   Table 2.
Consistency. Once the CNN model is trained it is deterministic; hence it was not evaluated for consistency. For illustration, one scan was uploaded twice on the platform and the resultant STLs were compared. Intra-and inter-observer consistency were calculated for the semi-automatic and refined segmentation. The intra-observer reliability of the main observer was calculated by re-segmenting 10 scans from the testing set with different protocols. For the inter-observer reliability, two observers (NM and KFV) performed the needed refinements, then the STL files were compared with each other.
Statistical analysis. Data were analyzed with RStudio: Integrated Development Environment for R, version 1.3.1093 (RStudio, PBC, Boston, MA). Mean and standard deviation were calculated for all evaluation metrics. A paired-sample t-test was performed with a significance level of p < 0.05 to compare the time required for semi-automatic and automatic segmentation of the testing set.
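The paired-sample t-test statistic can be sketched in a few lines. The study's analysis was run in R; the Python version below is only an illustration of the computation, and the timing values are hypothetical placeholders, not measurements from the paper. Converting the statistic to a p-value requires the t-distribution and is omitted here.

```python
import math
import statistics


def paired_t(a, b):
    """Return the paired t statistic and degrees of freedom for samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation of the differences
    return mean_d / (sd_d / math.sqrt(n)), n - 1


# Hypothetical per-scan times in seconds (illustrative only).
semi_automatic = [3600.0, 3700.0, 3650.0, 3580.0]
automatic = [24.0, 25.0, 24.5, 24.0]
t_stat, dof = paired_t(semi_automatic, automatic)
print(t_stat, dof)
```

Because each scan is segmented by both methods, the test works on per-scan differences, which removes between-scan variability from the comparison.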

Results
Time efficiency. The average time required was 60.8 min (3649.8 s) for semi-automatic segmentation and 24.4 s for automatic segmentation, a significant reduction (p-value < 2.2e−16). Considering the refined data, around 30% of the testing set needed refinements (20% class B, 10% class C, no class D or E), with an average refinement time of 7.1 min (422.84 s). The automatic and refined segmentations were approximately 149 and 9 times faster than the semi-automatic segmentation, respectively.
Table 3 provides an overview of the accuracy metrics for automatic segmentation. Overall, the automatic segmentation showed a DSC of 98.4% and RMS of 0.21 mm in comparison to the ground truth, implying that the 3D volumes and the surfaces of the models were closely matched (Fig. 4). The comparison between automatic and refined segmentations showed a DSC of 99.6% and RMS of 0.21 mm, indicating near-perfect overlap; this minimal difference meant that only minor refinements were needed.
Table 4 shows the metrics for intra- and inter-observer reliability, with a DSC of 98.4% and 99.6%, respectively. For the CNN model, test-retest reliability showed by default an identical match, with a DSC of 100%.
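The speed-up factors in the time-efficiency results follow directly from the reported average times, as this short worked calculation shows. Only the three averages are taken from the text; the variable names are illustrative.

```python
# Average times in seconds, as reported in the results.
semi_automatic_s = 3649.8
automatic_s = 24.4
refined_s = 422.84

speedup_automatic = semi_automatic_s / automatic_s
speedup_refined = semi_automatic_s / refined_s
print(round(speedup_automatic, 1), round(speedup_refined, 1))  # 149.6 8.6
```

The ratios of about 149.6 and 8.6 are consistent with the "approximately 149 and 9 times faster" stated in the text.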

Discussion
CBCT imaging has been widely employed in the field of oral and maxillofacial radiology for the visualization of orofacial structures, pre-surgical planning and follow-up assessment 11-13 . It allows for a 3D evaluation that is crucial for an accurate diagnosis and management of certain pathologies affecting the maxillofacial complex. Volumetric (3D) assessment of the maxillary sinus not only enhances the diagnostic process but also permits creation of reconstructed virtual models for presurgical planning purposes including implant placement, sinus floor elevation, removal of (impacted) posterior teeth and/or root remnants, reconstructive and orthognathic surgical procedures. In this sense, an accurate segmentation of the sinus cavity is an essential step.
Manual segmentation is not feasible in daily clinical practice since it is time-consuming and requires high operator experience. Semi-automatic segmentation techniques still require operator intervention for manual threshold selection. Additionally, the manual adjustments of segmented structures also require a considerable amount of time and may induce operator-based errors 31 . To overcome the above-mentioned limitations and to provide a reproducible and consistent technique, the present study aimed to develop and validate a novel automated maxillary sinus segmentation methodology on CBCT images using a CNN-based model.

Table 2. Metrics used for assessing accuracy and consistency.

| Metric | Legend | Formula |
| --- | --- | --- |
| Dice similarity coefficient (DSC) | Represents the overlap of voxels between volume X and volume Y divided by the total number of voxels in both of them. A DSC of 1 indicates complete overlap. | $\mathrm{DSC}(X, Y) = \frac{2\,\lvert X \cap Y \rvert}{\lvert X \rvert + \lvert Y \rvert}$ |
| Intersection over Union (IoU) | Represents the overlap of voxels between volume X and volume Y divided by their union. An IoU of 1 means a perfectly overlapping segmentation. | $\mathrm{IoU}(X, Y) = \frac{\lvert X \cap Y \rvert}{\lvert X \cup Y \rvert}$ |
| 95% Hausdorff distance (HD) | Represents the maximal distance between all pairs of voxels of volume X and volume Y. An HD of 0 mm indicates a perfect segmentation. The 95th percentile is used to eliminate the impact of a very small subset of outliers. | $d_{\mathrm{Hausdorff}}(X, Y) = \max\left\{ \sup_{x \in X} \inf_{y \in Y} d(x, y),\ \sup_{y \in Y} \inf_{x \in X} d(x, y) \right\}$ |
| Root mean square distance (RMS) | Measures the imperfections of the fit between two surfaces in mm. An RMS of 0 mm indicates a perfect match. | |
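The effect of taking the 95th percentile in the Hausdorff distance of Table 2 can be shown on a toy pair of point sets. This sketch assumes a nearest-rank percentile and a brute-force nearest-neighbour search; the function names and the toy coordinates are illustrative, and the exact percentile convention used in the study is not stated.

```python
import math


def directed_distances(a, b):
    """Distance from each point in a to its nearest point in b."""
    return [min(math.dist(p, q) for q in b) for p in a]


def percentile(values, q):
    """Nearest-rank percentile (an assumed convention)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]


def hausdorff(a, b, q=100):
    """Symmetric Hausdorff distance; q=95 gives the 95% variant."""
    return max(percentile(directed_distances(a, b), q),
               percentile(directed_distances(b, a), q))


# Toy surfaces: identical rows of 20 points, plus one outlier 10 mm away in b.
surface_a = [(float(i), 0.0, 0.0) for i in range(20)]
surface_b = surface_a + [(0.0, 10.0, 0.0)]

print(hausdorff(surface_a, surface_b), hausdorff(surface_a, surface_b, q=95))
```

The single outlier drives the classical Hausdorff distance to 10 mm, whereas the 95% variant ignores it, which is the robustness property noted in the table legend.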
The model in the current study was trained using data acquired by 2 CBCT devices (NewTom VGi evo and 3D Accuitomo 170) with different scanning parameters. Furthermore, images both with and without metal artifacts were included for increasing its robustness. A comparison was performed between the CBCT devices by using the CNN model versus the ground truth, and no significant differences were observed. Both devices showed a high DSC value of 98.37% (NewTom VGi evo) and 98.43% (3D Accuitomo 170). Hence, the whole dataset was treated as one sample.
When comparing the performance of the automatic versus the semi-automatic technique, the CNN-model showed remarkable results in relation to time, accuracy and consistency. The automatic segmentation was approximately 149 times faster (24.4 s) than the semi-automatic approach (60.8 min). When considering all the evaluation metrics, the CNN model showed a high similarity to the ground truth (see Table 3).
Based on the proposed classification for the clinical evaluation of automatic segmentation, almost 70% of the testing set was classified as perfect segmentation (class A), with no refinements required. For cases classified as B or C, refinements were mainly associated with cases having mucosal thickening. No deficient or negative predictions were present. Moreover, the small difference between automatic and refined segmentations (see Table 3) suggested that minimal refinements were needed. The inter-observer reliability for the refined segmentation showed a DSC of 99.6%, which implied consistency amongst observers. The model's performance was also 100% consistent during repeated segmentation of the same case, which is a great advantage over human performance, which varies each time a segmentation is performed. Additionally, the developed model was fully automatic without the need for any human intervention, which also overcomes the issues of threshold leveling and grey scale variability.
Table 3. Accuracy assessment of automatic segmentation. DSC dice similarity coefficient, IoU intersection over union, HD Hausdorff distance, RMS root mean square, SD standard deviation, Min minimal value, Max maximal value.
To date, few researchers 32-34 have investigated maxillary sinus segmentation from CBCT datasets, and with different study designs. Bui et al. 32 investigated an automatic segmentation technique of the paranasal sinuses and the nasal cavity from 10 CBCT images. They applied a multi-step coarse-to-fine active contour modelling and reported a Dice of 95.7% in comparison to manual segmentation by experts, which was taken as the ground truth. Neelapu et al. 33 developed a knowledge-based algorithm for automatically segmenting the maxillary sinus from 15 CBCT scans. The authors compared five segmentation techniques following automatic contour initialization and reported a Dice ranging between 80 and 90% for all the segmentation methods. Ham et al. 34 proposed an automatic maxillary sinus segmentation technique using one 3D U-Net and found a DSC of 92.8%. Even though a comparison with the aforementioned studies was difficult due to the variability in CBCT devices, scanning protocols and study designs, the CNN model proposed in the current study showed better results for the metrics evaluated. Furthermore, the time needed for each segmentation method was clearly stated and the sample size was justified, both of which have rarely been reported in previous studies. Recent studies 35,36 have reported on automatic segmentation of sinus mucosal thickening and pathological lesions, yet this was not the focus of our study.
The limitations of this study were similar to the existing challenges of artificial intelligence in dentistry 21,37 . Firstly, the data lacked heterogeneity, which limits model generalizability; this could be addressed by incorporating data from different CBCT devices with variable scanning parameters. Secondly, the online platform only allowed visualization and export of the automatic segmentation, and third-party software was required for performing the refinements. Recently, some editing tools have been added to the platform and additional features will be added soon to overcome this issue. Finally, the CNN model was able to extract the normal clear sinus and separate the bony borders in cases with sinus thickening; however, it cannot delineate the soft tissue. Future work will focus on the pathological conditions of the maxillary sinus.

Conclusions
A novel 3D U-Net architecture CNN model was developed and validated for automatic segmentation and 3D virtual model creation of the maxillary sinus from CBCT imaging. Owing to its promising performance in relation to time, accuracy and consistency, it can represent a solid base for future studies by incorporation of pathological conditions. An additional benefit of the model is the deployment to an online web-based user-interactive platform which could facilitate its application in clinical practice.